Please ensure you have the following packages installed before knitting:

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(warning = FALSE, message = FALSE) # suppress warnings and messages in the knit
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(httr)
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:purrr':
## 
##     flatten
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:httr':
## 
##     config
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
## 
## The following object is masked from 'package:ggplot2':
## 
##     margin
set.seed(1)

Data Source

This report looks at the number of cases and deaths during the COVID-19 pandemic, focusing on the United States and on overall global numbers. The main dataset consists of data from the Johns Hopkins University repository, accessed through GitHub.

Through this analysis, we hope to identify the rate of infections and deaths, as well as the areas most affected by the COVID-19 pandemic.

url_in <- "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/refs/heads/master/csse_covid_19_data/csse_covid_19_time_series/"

file_names <- c("time_series_covid19_confirmed_global.csv", "time_series_covid19_deaths_global.csv", "time_series_covid19_confirmed_US.csv", "time_series_covid19_deaths_US.csv")

urls <- str_c(url_in, file_names)

global_cases <- read_csv(urls[1])
global_deaths <- read_csv(urls[2])
us_cases <- read_csv(urls[3])
us_deaths <- read_csv(urls[4])

The next dataset comes from the Centers for Disease Control and Prevention (CDC): COVID deaths by state, broken down by sex and age group.

api_url <- "https://data.cdc.gov/resource/9bhg-hcku.json?$limit=200000"

stateDeaths_w_age_sex <- fromJSON(content(GET(api_url), "text"), flatten = TRUE)

The last dataset is the political party called for each US state in the 2020 election. A better feature for the predictive model, seen later on, would be the population spread by age. However, the data from the United States Census database was too difficult to process for this project. For now, the political affiliation will act as an extra feature.

state_PolParty <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vS3Z8Rq9xqOLISwoKdK0n6CFLBuPSCoXbbLeY8vhi-rzFS3ZFNEtR0BCdEbHcS-2Tlh5aPcnZbwBLao/pub?output=csv")

Tidying the Data

Cleaning

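For US Cases and US Deaths, pivot the date columns (every column after the first 11 and the first 12, respectively) into long format, with one row per county per date, then drop the identifier columns that are not needed. Rename Admin2 to County, iso2 to Country_Short (country abbreviation), FIPS to fips (county code), and, in the US Deaths dataset, cases to deaths. Convert the “dates” column from character to date type. For US Deaths, also drop rows whose Population is zero or missing.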

us_cases <- us_cases %>%
  pivot_longer(
    cols = !c(1:11),
    names_to = "dates",
    values_to = "cases",
    values_transform = as.numeric
  ) %>%
  select(-UID, -(iso3 : code3), -(Country_Region : Combined_Key)) %>%
  rename(County = Admin2, Country_Short = iso2, fips = FIPS) %>%
  mutate(dates = mdy(dates))
  
us_deaths <- us_deaths %>%
  pivot_longer(
    cols = !c(1:12),
    names_to = "dates",
    values_to = "cases",
    values_transform = as.numeric
  ) %>%
  select(-UID, -(iso3 : code3), -(Country_Region : Combined_Key)) %>%
  rename(County = Admin2, Country_Short = iso2, fips = FIPS, deaths = cases) %>%
  mutate(dates = mdy(dates)) %>%
  filter(!Population %in% c(0), !is.na(Population))

summary(us_cases)
##  Country_Short           fips          County          Province_State    
##  Length:3819906     Min.   :   60   Length:3819906     Length:3819906    
##  Class :character   1st Qu.:19077   Class :character   Class :character  
##  Mode  :character   Median :31012   Mode  :character   Mode  :character  
##                     Mean   :33043                                        
##                     3rd Qu.:47130                                        
##                     Max.   :99999                                        
##                     NA's   :11430                                        
##      dates                cases        
##  Min.   :2020-01-22   Min.   :  -3073  
##  1st Qu.:2020-11-02   1st Qu.:    330  
##  Median :2021-08-15   Median :   2272  
##  Mean   :2021-08-15   Mean   :  14088  
##  3rd Qu.:2022-05-28   3rd Qu.:   8159  
##  Max.   :2023-03-09   Max.   :3710586  
## 
summary(us_deaths)
##  Country_Short           fips          County          Province_State    
##  Length:3688461     Min.   :   60   Length:3688461     Length:3688461    
##  Class :character   1st Qu.:19023   Class :character   Class :character  
##  Mode  :character   Median :30018   Mode  :character   Mode  :character  
##                     Mean   :31337                                        
##                     3rd Qu.:46103                                        
##                     Max.   :72153                                        
##                     NA's   :1143                                         
##    Population           dates                deaths       
##  Min.   :      86   Min.   :2020-01-22   Min.   :    0.0  
##  1st Qu.:   11137   1st Qu.:2020-11-02   1st Qu.:    5.0  
##  Median :   26205   Median :2021-08-15   Median :   40.0  
##  Mean   :  103153   Mean   :2021-08-15   Mean   :  189.5  
##  3rd Qu.:   67493   3rd Qu.:2022-05-28   3rd Qu.:  126.0  
##  Max.   :10039107   Max.   :2023-03-09   Max.   :35545.0  
## 

For Global Cases and Global Deaths, pivot the date columns into long format as “dates” and “cases” (renamed to “deaths” in the deaths dataset). Rename the two columns containing “/”, convert the dates column from character to date type, and remove the “Lat” and “Long” columns.

global_cases <- global_cases %>%
  pivot_longer(
    cols = !c(1:4),
    names_to = "dates",
    values_to = "cases",
    values_transform = as.numeric
  ) %>%
  rename(Prov_State = 'Province/State', Country_Region = 'Country/Region') %>%
  mutate(dates = mdy(dates)) %>%
  select(-Lat, -Long)
  
global_deaths <- global_deaths %>%
  pivot_longer(
    cols = !c(1:4),
    names_to = "dates",
    values_to = "cases",
    values_transform = as.numeric
  ) %>%
  rename(Prov_State = 'Province/State', Country_Region = 'Country/Region', deaths = cases) %>%
  mutate(dates = mdy(dates)) %>%
  select(-Lat, -Long)

summary(global_cases)
##   Prov_State        Country_Region         dates                cases          
##  Length:330327      Length:330327      Min.   :2020-01-22   Min.   :        0  
##  Class :character   Class :character   1st Qu.:2020-11-02   1st Qu.:      680  
##  Mode  :character   Mode  :character   Median :2021-08-15   Median :    14429  
##                                        Mean   :2021-08-15   Mean   :   959384  
##                                        3rd Qu.:2022-05-28   3rd Qu.:   228517  
##                                        Max.   :2023-03-09   Max.   :103802702
summary(global_deaths)
##   Prov_State        Country_Region         dates                deaths       
##  Length:330327      Length:330327      Min.   :2020-01-22   Min.   :      0  
##  Class :character   Class :character   1st Qu.:2020-11-02   1st Qu.:      3  
##  Mode  :character   Mode  :character   Median :2021-08-15   Median :    150  
##                                        Mean   :2021-08-15   Mean   :  13380  
##                                        3rd Qu.:2022-05-28   3rd Qu.:   3032  
##                                        Max.   :2023-03-09   Max.   :1123836

The CDC data includes summary rows, so remove them by filtering out “All Ages”, “All Sexes”, and “United States”. Also drop rows with zero deaths or with a missing year, month, or death count. Remove unnecessary columns, and convert “month”, “year” and “covid_19_deaths” to numeric type.

stateDeaths_w_age_sex <- stateDeaths_w_age_sex %>%
    select(-(data_as_of:group), -(total_deaths:footnote)) %>%
    filter(!sex %in% c('All Sexes'), !age_group %in% c('All Ages'), !state %in% c('United States'), !covid_19_deaths %in% c(0)) %>%
    drop_na(year, month, covid_19_deaths) %>%
    mutate(month = as.numeric(month), year = as.numeric(year), covid_19_deaths = as.numeric(covid_19_deaths))

summary(stateDeaths_w_age_sex)
##     state               sex             age_group         covid_19_deaths  
##  Length:18282       Length:18282       Length:18282       Min.   :  10.00  
##  Class :character   Class :character   Class :character   1st Qu.:  17.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :  32.00  
##                                                           Mean   :  74.23  
##                                                           3rd Qu.:  73.00  
##                                                           Max.   :2944.00  
##       year          month       
##  Min.   :2020   Min.   : 1.000  
##  1st Qu.:2020   1st Qu.: 3.000  
##  Median :2021   Median : 7.000  
##  Mean   :2021   Mean   : 6.609  
##  3rd Qu.:2022   3rd Qu.:10.000  
##  Max.   :2023   Max.   :12.000

The political party dataset was filtered to keep only the state-level rows.

state_PolParty <- state_PolParty %>%
    select(state, called) %>%
    filter(
      !state %in% c("U.S. Total", "15 Key Battlegrounds", "Non-Battlegrounds"), 
      !grepl("1st District", state), 
      !grepl("2nd District", state), 
      !grepl("3rd District", state)
      )

Join and Summarize

As I understand it, the deaths column is not a count of deaths per day but a running total as of that day. Strangely, when calculating the day-over-day delta, some areas show negative deaths. A negative delta might be explainable for the cases column, where some people catch COVID (positive delta) and others recover or die (negative delta), but a running total of deaths should never decrease (nobody comes back to life). I will leave the negatives in: if a death was entered in error and later retracted, the negative delta cancels out the earlier addition.

However, because the lag function looks at the previous row regardless of county, the delta column is incorrect at county boundaries: if row x is County-A on Dec 31st, 2022 with 500 deaths and row x+1 is County-B on Jan 01, 2020 with 0 deaths, the delta is logged as -500 deaths. I got around this issue by using ifelse to check whether the county changed (i.e. if there is a change, the delta is set to zero).

usCovid_delta <- us_cases %>%
    full_join(us_deaths) %>%
    filter(cases != 0) %>%
    mutate( PrevCases = lag(cases, n = 1), 
            PrevDeaths = lag(deaths, n = 1),
            NewCases = ifelse( 
                        County == lag(County, n = 1),
                        cases - PrevCases,
                        0),
            NewDeaths = ifelse( 
                        County == lag(County, n = 1),
                        deaths - PrevDeaths,
                        0)
            ) %>%
    select(-PrevCases, -PrevDeaths)
global <- global_cases %>%
  full_join(global_deaths) %>%
  group_by(Prov_State, Country_Region, year(dates), month(dates)) %>%
  summarise(cases = max(cases), deaths = max(deaths))
#use this for the modeling

stateDeaths_w_age_sex <- stateDeaths_w_age_sex %>%
    full_join(state_PolParty)
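
The county-boundary problem described in this section can also be avoided by computing the delta within each group. The following is a minimal sketch on made-up data, not the code used for the results in this report. It is written in base R, the toy counties A and B are hypothetical, and it assumes the rows are already sorted by date within each county.

```r
# Toy cumulative death counts for two hypothetical counties.
toy <- data.frame(
  County = c("A", "A", "A", "A", "B", "B", "B"),
  dates  = as.Date("2020-03-01") + c(0, 1, 2, 3, 0, 1, 2),
  deaths = c(10, 12, 11, 15, 0, 1, 4)   # 12 -> 11 mimics a retracted death
)

# Daily change within each county: the first row of a county gets 0
# instead of inheriting the previous county's last total.
toy$NewDeaths <- ave(toy$deaths, toy$County,
                     FUN = function(x) c(0, diff(x)))

# Net change per county equals last total minus first total,
# so the negative correction cancels an earlier addition.
net <- tapply(toy$NewDeaths, toy$County, sum)

print(toy)
print(net)
```

In the dplyr style used elsewhere in this report, the same idea would be a group_by(County, Province_State) followed by mutate(NewDeaths = deaths - lag(deaths, default = first(deaths))), which should give the same zero on the first row of each county.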

Visuals and Analysis

The following is a heatmap showing the mean number of deaths by age group and state for the US. Texas and California stand out, specifically in the higher age groups.

ageGroup_heatmap <- stateDeaths_w_age_sex %>%
  group_by(state, age_group) %>%
  summarise(mean_deaths = mean(covid_19_deaths))

plot_ly(
  ageGroup_heatmap, 
  x = ~age_group, 
  y = ~state, 
  z = ~mean_deaths, 
  type = "heatmap"
  ) %>% 
  layout(title = 'Mean Deaths by State')

The next graph shows the monthly new cases and deaths in the United States as a whole, from January 2020 to March 2023.

#From what I've seen in the data, case and death counts start in march for most states

usCovid_delta_total <- usCovid_delta %>%
  filter(Country_Short == "US") %>%
  group_by(Year = year(dates), Month = month(dates)) %>%
  summarise(cases_delta = sum(NewCases, na.rm=T), deaths_delta = sum(NewDeaths, na.rm=T))

ggplot(usCovid_delta_total) +
  geom_line(
    aes(
      x = as.factor(Month), 
      y = cases_delta, 
      group = as.factor(Year), 
      colour = "Cases"), 
    linewidth = 1) +
  geom_line(
    aes(
      x = as.factor(Month), 
      y = deaths_delta, 
      group = as.factor(Year), 
      colour = "Deaths"), 
    linewidth = 1) +
  facet_grid(.~Year, scales = "free") +
  scale_y_continuous(trans = "log10", labels = comma) +
  labs(x = "Month",y = "Volume") + 
  ggtitle("COVID Cases and Deaths in the USA") +
  scale_color_manual(name = "COVID", values = c("Cases" = "blue", "Deaths" = "red"))

Create the same graph for COVID numbers globally.

global_delta_total <- global_cases %>%
    full_join(global_deaths) %>%
    filter(cases != 0) %>%
    mutate( PrevCases = lag(cases, n = 1), 
            PrevDeaths = lag(deaths, n = 1),
            NewCases = ifelse( 
                        (Prov_State == lag(Prov_State, n = 1) | is.na(Prov_State)) & Country_Region == lag(Country_Region, n = 1),
                        cases - PrevCases,
                        0),
            NewDeaths = ifelse( 
                        (Prov_State == lag(Prov_State, n = 1) | is.na(Prov_State))  & Country_Region == lag(Country_Region, n = 1),
                        deaths - PrevDeaths,
                        0)
            ) %>%
    select(-PrevCases, -PrevDeaths) %>%
    # Compare dates against a Date object; a bare 2020-01-22 is parsed as the arithmetic 2020 - 1 - 22
    filter(
      !(Country_Region %in% c("France", "United Kingdom") & is.na(Prov_State)),
      dates != as.Date("2020-01-22")
    ) %>%
    group_by(year = year(dates), month = month(dates)) %>%
    summarise(cases_delta = sum(NewCases, na.rm=T), deaths_delta = sum(NewDeaths, na.rm=T))

#Strange happenings: the lag function in the code above works for every row except where United Arab Emirates becomes United Kingdom, and Finland becomes France. I cannot find the issue, and trying the Lag function from Hmisc has the same issue. Since they are the only two instances, I have purposely removed them.
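
One possible explanation for the odd rows, offered as an assumption rather than something verified against the source file: a comparison with a missing Prov_State returns NA, and the is.na(Prov_State) fallback then treats any NA row as a continuation of the row before it whenever the country matches. If a country's national row (missing Prov_State) sits directly after one of its province rows, the delta would be computed across two different series. The sketch below uses made-up names (Province X, Country Y) and base R only.

```r
# Hypothetical rows: a country's last province row followed by its
# national row, which has a missing Prov_State.
prov    <- c("Province X", NA)
country <- c("Country Y", "Country Y")

# Previous-row values, mimicking lag(x, n = 1).
prev_prov    <- c(NA, prov[-length(prov)])
prev_country <- c(NA, country[-length(country)])

# Same shape as the check used for NewCases/NewDeaths: row 2 passes
# because is.na(prov) is TRUE, although the previous row is another series.
same_series <- (prov == prev_prov | is.na(prov)) & country == prev_country
print(same_series)

# NA-safe alternative: build one key per series and compare keys.
key      <- paste(country, ifelse(is.na(prov), "", prov), sep = "|")
prev_key <- c(NA, key[-length(key)])
same_key <- !is.na(prev_key) & key == prev_key
print(same_key)
```

Comparing a combined key (or grouping by Country_Region and Prov_State before taking the lag) keeps a national row from being differenced against a province row.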

ggplot(global_delta_total) +
  geom_line(aes(x = as.factor(month), y = cases_delta, group = as.factor(year), colour = "Cases"), linewidth = 1) + 
  geom_line(aes(x = as.factor(month), y = deaths_delta, group = as.factor(year), colour = "Deaths"), linewidth = 1) +
  facet_grid(.~year, scales = "free") +
  scale_y_continuous(trans = "log10", labels = comma) +
  labs(x = "Month",y = "Volume") + 
  ggtitle("COVID Cases and Deaths Globally") +
  scale_color_manual(name = "COVID", values = c("Cases" = "blue", "Deaths" = "red"))

Globally, 2022 had the highest number of cases, but also the largest gap between cases and deaths of any year. By this time, many countries were well into the vaccine rollout, as observed by the World Health Organization. This could explain the jump in case numbers at the beginning of 2022: vaccination allowed for more socializing and fewer lockdown restrictions (see https://ourworldindata.org/covid-vaccinations).

From the heatmap, California, Texas and Florida had the highest mean deaths in the older age groups. They are also the three most populous states of the US, as seen in the table below.

us_pops_byState <- us_deaths %>%
    filter(Country_Short == "US") %>%
    group_by(County, Province_State) %>%
    summarise(Population = mean(Population)) %>%
    group_by(Province_State) %>%
    summarise( total_pop = sum(Population, na.rm=T)) %>%
    arrange(desc(total_pop)) %>%
    head(10)

knitr::kable(us_pops_byState, "simple", format.args = list(big.mark = ",",
  scientific = FALSE), col.names = c("State", "Total Population"))
State           Total Population
--------------  ----------------
California            39,512,223
Texas                 28,995,881
Florida               21,477,737
New York              19,453,561
Pennsylvania          12,801,989
Illinois              12,671,821
Ohio                  11,689,100
Georgia               10,617,423
North Carolina        10,488,084
Michigan               9,986,857
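
The population table takes the mean per county before summing per state because Population is repeated on every daily row. The following is a small sketch of why that two-step aggregation matters, using made-up states (S1, S2) and counties (A, B, C) in base R. It illustrates the idea and is not the code that produced the table.

```r
# Toy daily rows: Population repeats on every date for a county.
toy <- data.frame(
  Province_State = c("S1", "S1", "S1", "S1", "S2", "S2"),
  County         = c("A",  "A",  "B",  "B",  "C",  "C"),
  Population     = c(100,  100,  250,  250,  40,   40)
)

# Summing directly counts each county once per date.
naive <- tapply(toy$Population, toy$Province_State, sum)

# Collapse to one value per county first, then sum per state.
per_county <- aggregate(Population ~ Province_State + County,
                        data = toy, FUN = mean)
per_state  <- aggregate(Population ~ Province_State,
                        data = per_county, FUN = sum)

print(naive)
print(per_state)
```

With two dates per county, the naive sum is exactly double the per-state total. With the real data it would be inflated by the number of dates per county.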

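As seen in the table below, most of the counties with the highest single-day increases in COVID cases and deaths are in warmer-climate states (Florida, California, Arizona and Nevada), with Cook County, Illinois, as the exception. One possible explanation, not tested here, is that the population in warmer climates is more likely to be outdoors.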

us_singleDayInc <- usCovid_delta %>%
    filter(Country_Short == "US", !is.na(Population)) %>%
    group_by(County, Province_State) %>%
    summarise(
      Population = mean(Population),  
      high_NewCases = max(NewCases, na.rm = T), 
      high_NewDeaths = max(NewDeaths, na.rm = T)) %>%
    arrange(desc(high_NewCases), desc(high_NewDeaths)) %>%
    head(10)

knitr::kable(us_singleDayInc, "simple", format.args = list(big.mark = ",",
  scientific = FALSE), col.names = c("County", "State", "Population", "Max New Cases", "Max New Deaths"))
County       State       Population  Max New Cases  Max New Deaths
-----------  ----------  ----------  -------------  --------------
Miami-Dade   Florida      2,716,940        110,441           2,806
San Diego    California   3,338,330         52,300              79
Broward      Florida      1,952,778         50,254           1,913
Los Angeles  California  10,039,107         45,553             928
Cook         Illinois     5,150,233         41,289             167
Maricopa     Arizona      4,485,414         34,764             318
Palm Beach   Florida      1,496,770         34,340           1,446
Orange       Florida      1,393,452         30,752             988
Clark        Nevada       2,266,715         27,876             356
Orange       California   3,175,692         25,439             110

us_deathsBy_sex <- stateDeaths_w_age_sex %>%
    group_by(sex) %>%
    summarise(total_deaths = sum(covid_19_deaths)) %>%
    mutate(deci = total_deaths / sum(total_deaths)) %>% 
    mutate(perc = scales::percent(deci))

us_deathsBy_age_group <- stateDeaths_w_age_sex %>%
  group_by(age_group) %>%
  summarise(total_deaths = sum(covid_19_deaths)) %>%
  mutate(deci = total_deaths / sum(total_deaths)) %>% 
  mutate(perc = scales::percent(deci))

Looking more closely at the sex of those who died from COVID in the US, 43% were female (589,372 total) and 57% were male (767,725 total). For comparison, the total US population is 50.25% male and 49.75% female (as of 2023, World Bank Group, https://data.worldbank.org/indicator/SP.POP.TOTL.FE.ZS?locations=US&view=map&year=2023).
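
As a quick check of the arithmetic, the shares can be recomputed from the two totals quoted above. The ratio at the end is an illustrative extra, not part of the original analysis: it compares the male share of deaths to the male share of the population, using the 50.25% figure from the same paragraph.

```r
# Totals quoted in the text.
deaths <- c(Female = 589372, Male = 767725)

# Share of COVID deaths by sex.
share <- deaths / sum(deaths)
print(round(100 * share, 1))

# Male share of deaths relative to male share of the population
# (50.25%, as quoted in the text). Values above 1 mean over-representation.
male_ratio <- share[["Male"]] / 0.5025
print(round(male_ratio, 3))
```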

Deaths by age group highlight that people past retirement age are at high risk from infections and viruses. Approximately 63% of the deaths were among people 65 years and older.

Models

stateDeaths_w_age_sex <- na.omit(stateDeaths_w_age_sex)

model <- randomForest(
  formula = covid_19_deaths ~ ., 
  data = stateDeaths_w_age_sex,
  mtry = 6
  )

print(model)
## 
## Call:
##  randomForest(formula = covid_19_deaths ~ ., data = stateDeaths_w_age_sex,      mtry = 6) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 6
## 
##           Mean of squared residuals: 1957.847
##                     % Var explained: 89.38
stateDeaths_w_age_sex <- na.omit(stateDeaths_w_age_sex) %>%
  mutate(pred = round(predict(model),2))

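The mean of squared residuals is the average squared error per row, so it is measured in squared deaths. Its square root (about 44 deaths) is the typical size of the model's error for a single row. The lower this number and the higher the % variance explained, the better. Both figures are out-of-bag estimates.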
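
To put the printed error back on the scale of deaths, take its square root (the root mean squared error, RMSE). The sketch below does this for the value printed by the model and adds a small helper, rmse_of, which is a hypothetical name introduced here for illustration. The toy vectors are made up.

```r
# Value printed by print(model) above.
mse  <- 1957.847
rmse <- sqrt(mse)          # back on the scale of deaths
print(round(rmse, 2))

# Hypothetical helper: RMSE from any pair of actual/predicted vectors.
rmse_of <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy vectors, not taken from the dataset.
print(rmse_of(c(10, 20, 30), c(12, 18, 33)))
```

Applied to this report's data, the helper could be called as rmse_of(stateDeaths_w_age_sex$covid_19_deaths, stateDeaths_w_age_sex$pred). The result should be close to the square root above, with small differences because pred is rounded.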

#Compare model to whole US
us_v_Pred <- stateDeaths_w_age_sex %>% 
  group_by(year, month) %>%
  summarise(deaths = sum(covid_19_deaths, na.rm = T), pred = sum(pred, na.rm = T))

ggplot(us_v_Pred) +
  geom_line(aes(x = as.factor(month), y = deaths, group = as.factor(year), colour = "Actuals"), linewidth = 1) + 
  geom_line(aes(x = as.factor(month), y = pred, group = as.factor(year), colour = "Model"), linewidth = 1) +
  facet_grid(.~year, scales = "free") +
  scale_y_continuous(trans = "log10", labels = comma) +
  labs(x = "Month",y = "Volume") + 
  ggtitle("COVID Deaths in the US: Prediction vs Actuals") +
  scale_color_manual(name = "COVID", values = c("Actuals" = "steelblue", "Model" = "orange3"))

Visually, the model tracks the actuals very closely. The next graph focuses on the state of California and shows that the fit is not as clean when the model for the US as a whole is evaluated on a single state.

#compare model to actual for California
Calif_v_Pred <- stateDeaths_w_age_sex %>% 
    filter(state == "California") %>%
    group_by(state, year, month) %>%
    summarise(deaths = sum(covid_19_deaths, na.rm = T), pred = sum(pred, na.rm = T))

ggplot(Calif_v_Pred) +
  geom_line(aes(x = as.factor(month), y = deaths, group = as.factor(year), colour = "Actuals"), linewidth = 1) + 
  geom_line(aes(x = as.factor(month), y = pred, group = as.factor(year), colour = "Model"), linewidth = 1) +
  facet_grid(.~year, scales = "free") +
  scale_y_continuous(trans = "log10", labels = comma) +
  labs(x = "Month",y = "Volume") + 
  ggtitle("COVID Deaths in California: Prediction vs Actuals") +
  scale_color_manual(name = "COVID", values = c("Actuals" = "steelblue", "Model" = "orange3"))

Biases

Sampling bias: the information for other countries is not as extensive, or as easily available, as for the USA. Some countries may have been unable or unwilling to properly report COVID cases and deaths.

In addition, choosing to import the political party by state, instead of the religious demographic spread or another feature, can be seen as a bias.

Preprocessing bias: There is only so much data I can compile. At some point, I wanted to import the US census data, but the output had variable codes instead of the text names and would have taken much more time to clean. To avoid going down a rabbit hole and wasting time, I had to close off that route.

Appendix

sessionInfo()
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8   
## [3] LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
## [5] LC_TIME=English_Canada.utf8    
## 
## time zone: America/Toronto
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] randomForest_4.7-1.2 scales_1.3.0         plotly_4.10.4       
##  [4] jsonlite_1.8.9       httr_1.4.7           lubridate_1.9.4     
##  [7] forcats_1.0.0        stringr_1.5.1        dplyr_1.1.4         
## [10] purrr_1.0.2          readr_2.1.5          tidyr_1.3.1         
## [13] tibble_3.2.1         ggplot2_3.5.1        tidyverse_2.0.0     
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9        generics_0.1.3    stringi_1.8.4     hms_1.1.3        
##  [5] digest_0.6.37     magrittr_2.0.3    evaluate_1.0.3    grid_4.4.2       
##  [9] timechange_0.3.0  fastmap_1.2.0     crosstalk_1.2.1   viridisLite_0.4.2
## [13] lazyeval_0.2.2    jquerylib_0.1.4   cli_3.6.3         rlang_1.1.5      
## [17] crayon_1.5.3      bit64_4.6.0-1     munsell_0.5.1     withr_3.0.2      
## [21] cachem_1.1.0      yaml_2.3.10       parallel_4.4.2    tools_4.4.2      
## [25] tzdb_0.4.0        colorspace_2.1-1  curl_6.2.0        vctrs_0.6.5      
## [29] R6_2.5.1          lifecycle_1.0.4   htmlwidgets_1.6.4 bit_4.5.0.1      
## [33] vroom_1.6.5       pkgconfig_2.0.3   pillar_1.10.1     bslib_0.8.0      
## [37] gtable_0.3.6      glue_1.8.0        data.table_1.16.4 xfun_0.50        
## [41] tidyselect_1.2.1  rstudioapi_0.17.1 knitr_1.49        farver_2.1.2     
## [45] htmltools_0.5.8.1 rmarkdown_2.29    compiler_4.4.2

Citations

National Center for Health Statistics. Provisional COVID-19 Deaths by County, and Race and Hispanic Origin. Date accessed [2025-02-09]. https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-in-the-United-St/kn79-hsxy/data_preview

Johns Hopkins Center for Systems Science and Engineering. Novel Coronavirus (COVID-19) Cases. Date accessed [2025-02-09] https://github.com/CSSEGISandData/COVID-19

The Cook Political Report (with Amy Walter). Popular Vote Backend. Date accessed [2025-02-09]. https://www.cookpolitical.com/vote-tracker/2020/electoral-college